Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 29674 |
| Missing cells | 13 |
| Missing cells (%) | < 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.2 MiB |
| Average record size in memory | 112.0 B |
Variable types
| NUM | 12 |
|---|---|
| BOOL | 2 |
Reproduction
| Analysis started | 2020-05-24 15:33:18.415132 |
|---|---|
| Analysis finished | 2020-05-24 15:34:33.208095 |
| Version | pandas-profiling v2.6.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Warning | Type |
|---|---|
| AGE is highly correlated with AGE_GROUP | High Correlation |
| AGE_GROUP is highly correlated with AGE | High Correlation |
| LIMIT_BAL is highly correlated with LIMIT_BAL_GROUP | High Correlation |
| LIMIT_BAL_GROUP is highly correlated with LIMIT_BAL | High Correlation |
| PAY_AMT2 is highly skewed (γ1 = 30.31213063) | Skewed |
| BILL_AMT5 has 3211 (10.8%) zeros | Zeros |
| PAY_AMT1 has 4931 (16.6%) zeros | Zeros |
| PAY_AMT2 has 5075 (17.1%) zeros | Zeros |
| PAY_AMT3 has 5647 (19.0%) zeros | Zeros |
| PAY_AMT4 has 6087 (20.5%) zeros | Zeros |
| PAY_AMT5 has 6382 (21.5%) zeros | Zeros |
| PAY_AMT6 has 6852 (23.1%) zeros | Zeros |
df_index
| Distinct count | 29674 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14957.007514996292 |
|---|---|
| Minimum | 0 |
| Maximum | 30000 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1483.65 |
| Q1 | 7451.25 |
| median | 14936.5 |
| Q3 | 22454.75 |
| 95-th percentile | 28490.35 |
| Maximum | 30000 |
| Range | 30000 |
| Interquartile range (IQR) | 15003.5 |
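The last two rows of the quantile table are derived from the others: the range is the maximum minus the minimum (30000 - 0 = 30000) and the IQR is Q3 minus Q1 (22454.75 - 7451.25 = 15003.5). A minimal sketch of that arithmetic, using only the Python standard library, is shown here. The function name `quantile_summary` and the sample data are illustrative, and the `"inclusive"` interpolation method is an assumption about how the report interpolates quantiles, not something the report states.

```python
from statistics import quantiles


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR for an iterable of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics
    # (assumed here; the report does not say which method it uses).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],  # maximum minus minimum
        "iqr": q3 - q1,               # Q3 minus Q1
    }


# Hypothetical data: the integers 0, 1, ..., 100.
summary = quantile_summary(range(0, 101))
print(summary["range"], summary["iqr"])
```

The same subtraction can be applied to any of the quantile tables in this report to check the reported range and IQR.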
Descriptive statistics
| Standard deviation | 8662.794748 |
|---|---|
| Coefficient of variation (CV) | 0.5791796747 |
| Kurtosis | -1.200223035 |
| Mean | 14957.00751 |
| Mean Absolute Deviation (MAD) | 7502.011839 |
| Skewness | 0.005745492805 |
| Sum | 443834241 |
| Variance | 75044012.84 |
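Two rows in the descriptive statistics can be checked by hand. The coefficient of variation is the standard deviation divided by the mean (8662.794748 / 14957.00751 ≈ 0.5792). The MAD row behaves like a mean absolute deviation around the mean rather than a median absolute deviation: several reported values are non-integers even though the underlying data are integers, and some exceed what a median absolute deviation could be. That reading is an inference from the numbers, not something the report states. The sketch below contrasts the two definitions. All function names and the sample data are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def mean_absolute_deviation(values):
    """Average distance of each value from the mean."""
    m = mean(values)
    return mean(abs(x - m) for x in values)


def median_absolute_deviation(values):
    """Median distance of each value from the median."""
    med = median(values)
    return median(abs(x - med) for x in values)


# Hypothetical, heavily right-skewed payment amounts.
payments = [0, 0, 1000, 2000, 3000, 5000, 100000]

print(coefficient_of_variation(payments))
print(mean_absolute_deviation(payments))    # pulled up by the single outlier
print(median_absolute_deviation(payments))  # barely affected by the outlier
```

On skewed data such as the payment columns, the two deviations differ by an order of magnitude, which is why the label matters when interpreting the row.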
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 30000.], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 11535 | 1 | < 0.1% | |
| 23841 | 1 | < 0.1% | |
| 17698 | 1 | < 0.1% | |
| 19747 | 1 | < 0.1% | |
| 29988 | 1 | < 0.1% | |
| 25894 | 1 | < 0.1% | |
| 27943 | 1 | < 0.1% | |
| 5416 | 1 | < 0.1% | |
| 7465 | 1 | < 0.1% | |
| Other values (29664) | 29664 | > 99.9% |
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 30000 | 1 | < 0.1% | |
| 29999 | 1 | < 0.1% | |
| 29998 | 1 | < 0.1% | |
| 29997 | 1 | < 0.1% | |
| 29996 | 1 | < 0.1% |
GRADUATE
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 232.0 KiB |
| Value | Count | Frequency (%) | |
| 0 | 19249 | 64.9% | |
| 1 | 10424 | 35.1% | |
| (Missing) | 1 | < 0.1% |
AGE_GROUP
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.4628113099450677 |
|---|---|
| Minimum | 1.0 |
| Maximum | 8.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 8 |
| Range | 7 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.791121383 |
|---|---|
| Coefficient of variation (CV) | 0.5172448692 |
| Kurtosis | -0.6465529551 |
| Mean | 3.46281131 |
| Mean Absolute Deviation (MAD) | 1.510043384 |
| Skewness | 0.5046278967 |
| Sum | 102752 |
| Variance | 3.208115809 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2 | 7050 | 23.8% | |
| 3 | 5728 | 19.3% | |
| 4 | 4845 | 16.3% | |
| 1 | 3833 | 12.9% | |
| 5 | 3577 | 12.1% | |
| 6 | 2381 | 8.0% | |
| 7 | 1988 | 6.7% | |
| 8 | 271 | 0.9% | |
| (Missing) | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1 | 3833 | 12.9% | |
| 2 | 7050 | 23.8% | |
| 3 | 5728 | 19.3% | |
| 4 | 4845 | 16.3% | |
| 5 | 3577 | 12.1% |
| Value | Count | Frequency (%) | |
| 8 | 271 | 0.9% | |
| 7 | 1988 | 6.7% | |
| 6 | 2381 | 8.0% | |
| 5 | 3577 | 12.1% | |
| 4 | 4845 | 16.3% |
BILL_AMT5
| Distinct count | 21010 |
|---|---|
| Unique (%) | 70.8% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40753.74441411384 |
|---|---|
| Minimum | -81334.0 |
| Maximum | 927171.0 |
| Zeros | 3211 |
| Zeros (%) | 10.8% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | -81334 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1981 |
| median | 18413 |
| Q3 | 50673 |
| 95-th percentile | 166749 |
| Maximum | 927171 |
| Range | 1008505 |
| Interquartile range (IQR) | 48692 |
Descriptive statistics
| Standard deviation | 60984.2006 |
|---|---|
| Coefficient of variation (CV) | 1.496407299 |
| Kurtosis | 12.20015426 |
| Mean | 40753.74441 |
| Mean Absolute Deviation (MAD) | 41395.08864 |
| Skewness | 2.862787937 |
| Sum | 1209285858 |
| Variance | 3719072723 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 3211 | 10.8% | |
| 390 | 233 | 0.8% | |
| 780 | 94 | 0.3% | |
| 316 | 79 | 0.3% | |
| 326 | 62 | 0.2% | |
| 150 | 57 | 0.2% | |
| 396 | 45 | 0.2% | |
| 416 | 36 | 0.1% | |
| 2500 | 34 | 0.1% | |
| 2400 | 31 | 0.1% | |
| Other values (21000) | 25791 | 86.9% |
| Value | Count | Frequency (%) | |
| -81334 | 1 | < 0.1% | |
| -61372 | 1 | < 0.1% | |
| -53007 | 1 | < 0.1% | |
| -46627 | 1 | < 0.1% | |
| -37594 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 927171 | 1 | < 0.1% | |
| 823540 | 1 | < 0.1% | |
| 587067 | 1 | < 0.1% | |
| 551702 | 1 | < 0.1% | |
| 547880 | 1 | < 0.1% |
PAY_AMT1
| Distinct count | 7943 |
|---|---|
| Unique (%) | 26.8% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5725.735180130085 |
|---|---|
| Minimum | 0.0 |
| Maximum | 873552.0 |
| Zeros | 4931 |
| Zeros (%) | 16.6% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1000 |
| median | 2157 |
| Q3 | 5025 |
| 95-th percentile | 18610.2 |
| Maximum | 873552 |
| Range | 873552 |
| Interquartile range (IQR) | 4025 |
Descriptive statistics
| Standard deviation | 16643.64462 |
|---|---|
| Coefficient of variation (CV) | 2.906813552 |
| Kurtosis | 411.5383141 |
| Mean | 5725.73518 |
| Mean Absolute Deviation (MAD) | 5959.618873 |
| Skewness | 14.60546767 |
| Sum | 169899740 |
| Variance | 277010906.2 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 4931 | 16.6% | |
| 2000 | 1362 | 4.6% | |
| 3000 | 891 | 3.0% | |
| 5000 | 698 | 2.4% | |
| 1500 | 507 | 1.7% | |
| 4000 | 426 | 1.4% | |
| 10000 | 401 | 1.4% | |
| 1000 | 363 | 1.2% | |
| 2500 | 298 | 1.0% | |
| 6000 | 294 | 1.0% | |
| Other values (7933) | 19502 | 65.7% |
| Value | Count | Frequency (%) | |
| 0 | 4931 | 16.6% | |
| 1 | 9 | < 0.1% | |
| 2 | 14 | < 0.1% | |
| 3 | 15 | 0.1% | |
| 4 | 18 | 0.1% |
| Value | Count | Frequency (%) | |
| 873552 | 1 | < 0.1% | |
| 505000 | 1 | < 0.1% | |
| 493358 | 1 | < 0.1% | |
| 423903 | 1 | < 0.1% | |
| 405016 | 1 | < 0.1% |
PAY_AMT2
| Distinct count | 7899 |
|---|---|
| Unique (%) | 26.6% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5986.2915782024065 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1684259.0 |
| Zeros | 5075 |
| Zeros (%) | 17.1% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1000 |
| median | 2037 |
| Q3 | 5000 |
| 95-th percentile | 19233.6 |
| Maximum | 1684259 |
| Range | 1684259 |
| Interquartile range (IQR) | 4000 |
Descriptive statistics
| Standard deviation | 23159.08083 |
|---|---|
| Coefficient of variation (CV) | 3.868685734 |
| Kurtosis | 1625.729375 |
| Mean | 5986.291578 |
| Mean Absolute Deviation (MAD) | 6522.828405 |
| Skewness | 30.31213063 |
| Sum | 177631230 |
| Variance | 536343024.7 |
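The skewness row above is the value quoted in the "highly skewed" warning for this variable. A minimal sketch of how a sample skewness can be computed from the second and third central moments is shown here. It uses the bias-corrected (adjusted Fisher-Pearson) estimator. Assuming the report uses this exact estimator is a guess on my part, and the function name and sample data are hypothetical.

```python
from math import sqrt


def sample_skewness(values):
    """Bias-corrected sample skewness G1 (adjusted Fisher-Pearson).

    Requires at least three values and non-zero variance.
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                         # biased moment estimator
    return sqrt(n * (n - 1)) / (n - 2) * g1     # small-sample correction


symmetric = [1, 2, 3, 4, 5]        # hypothetical, no skew
right_tailed = [0, 0, 0, 0, 100]   # hypothetical, one large payment

print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

A few very large values among many small or zero ones, as in the payment columns, are enough to push the statistic far above zero.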
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 5075 | 17.1% | |
| 2000 | 1290 | 4.3% | |
| 3000 | 857 | 2.9% | |
| 5000 | 717 | 2.4% | |
| 1000 | 594 | 2.0% | |
| 1500 | 521 | 1.8% | |
| 4000 | 410 | 1.4% | |
| 10000 | 318 | 1.1% | |
| 6000 | 283 | 1.0% | |
| 2500 | 251 | 0.8% | |
| Other values (7889) | 19357 | 65.2% |
| Value | Count | Frequency (%) | |
| 0 | 5075 | 17.1% | |
| 1 | 15 | 0.1% | |
| 2 | 20 | 0.1% | |
| 3 | 18 | 0.1% | |
| 4 | 11 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1684259 | 1 | < 0.1% | |
| 1227082 | 1 | < 0.1% | |
| 1215471 | 1 | < 0.1% | |
| 1024516 | 1 | < 0.1% | |
| 580464 | 1 | < 0.1% |
PAY_AMT3
| Distinct count | 7518 |
|---|---|
| Unique (%) | 25.3% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5283.145283591143 |
|---|---|
| Minimum | 0.0 |
| Maximum | 896040.0 |
| Zeros | 5647 |
| Zeros (%) | 19.0% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 444 |
| median | 1880 |
| Q3 | 4600 |
| 95-th percentile | 18000 |
| Maximum | 896040 |
| Range | 896040 |
| Interquartile range (IQR) | 4156 |
Descriptive statistics
| Standard deviation | 17695.15307 |
|---|---|
| Coefficient of variation (CV) | 3.349359543 |
| Kurtosis | 558.9924934 |
| Mean | 5283.145284 |
| Mean Absolute Deviation (MAD) | 5907.410461 |
| Skewness | 17.13796108 |
| Sum | 156766770 |
| Variance | 313118442.2 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 5647 | 19.0% | |
| 2000 | 1285 | 4.3% | |
| 1000 | 1103 | 3.7% | |
| 3000 | 870 | 2.9% | |
| 5000 | 721 | 2.4% | |
| 1500 | 490 | 1.7% | |
| 4000 | 381 | 1.3% | |
| 10000 | 312 | 1.1% | |
| 1200 | 243 | 0.8% | |
| 6000 | 241 | 0.8% | |
| Other values (7508) | 18380 | 61.9% |
| Value | Count | Frequency (%) | |
| 0 | 5647 | 19.0% | |
| 1 | 13 | < 0.1% | |
| 2 | 19 | 0.1% | |
| 3 | 14 | < 0.1% | |
| 4 | 15 | 0.1% |
| Value | Count | Frequency (%) | |
| 896040 | 1 | < 0.1% | |
| 889043 | 1 | < 0.1% | |
| 508229 | 1 | < 0.1% | |
| 417588 | 1 | < 0.1% | |
| 400972 | 1 | < 0.1% |
PAY_AMT4
| Distinct count | 6937 |
|---|---|
| Unique (%) | 23.4% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4879.136959525495 |
|---|---|
| Minimum | 0.0 |
| Maximum | 621000.0 |
| Zeros | 6087 |
| Zeros (%) | 20.5% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 324 |
| median | 1502 |
| Q3 | 4100 |
| 95-th percentile | 16361.2 |
| Maximum | 621000 |
| Range | 621000 |
| Interquartile range (IQR) | 3776 |
Descriptive statistics
| Standard deviation | 15744.04332 |
|---|---|
| Coefficient of variation (CV) | 3.226809055 |
| Kurtosis | 274.6837041 |
| Mean | 4879.13696 |
| Mean Absolute Deviation (MAD) | 5569.795488 |
| Skewness | 12.84474608 |
| Sum | 144778631 |
| Variance | 247874900.2 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 6087 | 20.5% | |
| 1000 | 1394 | 4.7% | |
| 2000 | 1214 | 4.1% | |
| 3000 | 887 | 3.0% | |
| 5000 | 810 | 2.7% | |
| 1500 | 441 | 1.5% | |
| 4000 | 402 | 1.4% | |
| 10000 | 341 | 1.1% | |
| 2500 | 259 | 0.9% | |
| 500 | 258 | 0.9% | |
| Other values (6927) | 17580 | 59.2% |
| Value | Count | Frequency (%) | |
| 0 | 6087 | 20.5% | |
| 1 | 22 | 0.1% | |
| 2 | 22 | 0.1% | |
| 3 | 13 | < 0.1% | |
| 4 | 20 | 0.1% |
| Value | Count | Frequency (%) | |
| 621000 | 1 | < 0.1% | |
| 528897 | 1 | < 0.1% | |
| 497000 | 1 | < 0.1% | |
| 432130 | 1 | < 0.1% | |
| 400046 | 1 | < 0.1% |
PAY_AMT5
| Distinct count | 6897 |
|---|---|
| Unique (%) | 23.2% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4852.153607656793 |
|---|---|
| Minimum | 0.0 |
| Maximum | 426529.0 |
| Zeros | 6382 |
| Zeros (%) | 21.5% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 315 |
| median | 1544 |
| Q3 | 4100 |
| 95-th percentile | 16220.6 |
| Maximum | 426529 |
| Range | 426529 |
| Interquartile range (IQR) | 3785 |
Descriptive statistics
| Standard deviation | 15353.94255 |
|---|---|
| Coefficient of variation (CV) | 3.164356241 |
| Kurtosis | 178.3092098 |
| Mean | 4852.153608 |
| Mean Absolute Deviation (MAD) | 5518.671954 |
| Skewness | 11.07463591 |
| Sum | 143977954 |
| Variance | 235743551.9 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 6382 | 21.5% | |
| 1000 | 1340 | 4.5% | |
| 2000 | 1323 | 4.5% | |
| 3000 | 947 | 3.2% | |
| 5000 | 814 | 2.7% | |
| 1500 | 426 | 1.4% | |
| 4000 | 401 | 1.4% | |
| 10000 | 343 | 1.2% | |
| 500 | 250 | 0.8% | |
| 6000 | 247 | 0.8% | |
| Other values (6887) | 17200 | 58.0% |
| Value | Count | Frequency (%) | |
| 0 | 6382 | 21.5% | |
| 1 | 21 | 0.1% | |
| 2 | 13 | < 0.1% | |
| 3 | 13 | < 0.1% | |
| 4 | 12 | < 0.1% |
| Value | Count | Frequency (%) | |
| 426529 | 1 | < 0.1% | |
| 417990 | 1 | < 0.1% | |
| 388071 | 1 | < 0.1% | |
| 379267 | 1 | < 0.1% | |
| 332000 | 1 | < 0.1% |
PAY_AMT6
| Distinct count | 6939 |
|---|---|
| Unique (%) | 23.4% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5272.854177198126 |
|---|---|
| Minimum | 0.0 |
| Maximum | 528666.0 |
| Zeros | 6852 |
| Zeros (%) | 23.1% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 200 |
| median | 1500 |
| Q3 | 4032 |
| 95-th percentile | 17543.6 |
| Maximum | 528666 |
| Range | 528666 |
| Interquartile range (IQR) | 3832 |
Descriptive statistics
| Standard deviation | 17866.70954 |
|---|---|
| Coefficient of variation (CV) | 3.388432325 |
| Kurtosis | 165.4893385 |
| Mean | 5272.854177 |
| Mean Absolute Deviation (MAD) | 6246.817918 |
| Skewness | 10.58823937 |
| Sum | 156461402 |
| Variance | 319219309.8 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 6852 | 23.1% | |
| 1000 | 1299 | 4.4% | |
| 2000 | 1295 | 4.4% | |
| 3000 | 914 | 3.1% | |
| 5000 | 808 | 2.7% | |
| 1500 | 439 | 1.5% | |
| 4000 | 411 | 1.4% | |
| 10000 | 356 | 1.2% | |
| 500 | 247 | 0.8% | |
| 6000 | 220 | 0.7% | |
| Other values (6929) | 16832 | 56.7% |
| Value | Count | Frequency (%) | |
| 0 | 6852 | 23.1% | |
| 1 | 20 | 0.1% | |
| 2 | 9 | < 0.1% | |
| 3 | 14 | < 0.1% | |
| 4 | 12 | < 0.1% |
| Value | Count | Frequency (%) | |
| 528666 | 1 | < 0.1% | |
| 527143 | 1 | < 0.1% | |
| 443001 | 1 | < 0.1% | |
| 422000 | 1 | < 0.1% | |
| 403500 | 1 | < 0.1% |
LIMIT_BAL_GROUP
| Distinct count | 7 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.2573046203619453 |
|---|---|
| Minimum | 1.0 |
| Maximum | 7.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.827953897 |
|---|---|
| Coefficient of variation (CV) | 0.5611860449 |
| Kurtosis | -1.36848069 |
| Mean | 3.25730462 |
| Mean Absolute Deviation (MAD) | 1.62979723 |
| Skewness | 0.1682842791 |
| Sum | 96654 |
| Variance | 3.34141545 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1 | 7607 | 25.6% | |
| 5 | 4997 | 16.8% | |
| 2 | 4784 | 16.1% | |
| 6 | 4300 | 14.5% | |
| 4 | 3915 | 13.2% | |
| 3 | 3864 | 13.0% | |
| 7 | 206 | 0.7% | |
| (Missing) | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1 | 7607 | 25.6% | |
| 2 | 4784 | 16.1% | |
| 3 | 3864 | 13.0% | |
| 4 | 3915 | 13.2% | |
| 5 | 4997 | 16.8% |
| Value | Count | Frequency (%) | |
| 7 | 206 | 0.7% | |
| 6 | 4300 | 14.5% | |
| 5 | 4997 | 16.8% | |
| 4 | 3915 | 13.2% | |
| 3 | 3864 | 13.0% |
default
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 232.0 KiB |
| Value | Count | Frequency (%) | |
| 0 | 23121 | 77.9% | |
| 1 | 6552 | 22.1% | |
| (Missing) | 1 | < 0.1% |
AGE
| Distinct count | 56 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.501870387220706 |
|---|---|
| Minimum | 21.0 |
| Maximum | 79.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 23 |
| Q1 | 28 |
| median | 34 |
| Q3 | 41 |
| 95-th percentile | 53 |
| Maximum | 79 |
| Range | 58 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 9.232184001 |
|---|---|
| Coefficient of variation (CV) | 0.26004782 |
| Kurtosis | 0.03836190383 |
| Mean | 35.50187039 |
| Mean Absolute Deviation (MAD) | 7.560042396 |
| Skewness | 0.7305412748 |
| Sum | 1053447 |
| Variance | 85.23322143 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 29 | 1575 | 5.3% | |
| 27 | 1463 | 4.9% | |
| 28 | 1387 | 4.7% | |
| 30 | 1382 | 4.7% | |
| 26 | 1243 | 4.2% | |
| 31 | 1200 | 4.0% | |
| 25 | 1176 | 4.0% | |
| 34 | 1146 | 3.9% | |
| 32 | 1144 | 3.9% | |
| 33 | 1137 | 3.8% | |
| Other values (46) | 16820 | 56.7% |
| Value | Count | Frequency (%) | |
| 21 | 67 | 0.2% | |
| 22 | 556 | 1.9% | |
| 23 | 916 | 3.1% | |
| 24 | 1118 | 3.8% | |
| 25 | 1176 | 4.0% |
| Value | Count | Frequency (%) | |
| 79 | 1 | < 0.1% | |
| 75 | 3 | < 0.1% | |
| 74 | 1 | < 0.1% | |
| 73 | 4 | < 0.1% | |
| 72 | 3 | < 0.1% |
LIMIT_BAL
| Distinct count | 81 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 167330.8961008324 |
|---|---|
| Minimum | 10000.0 |
| Maximum | 1000000.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 232.0 KiB |
Quantile statistics
| Minimum | 10000 |
|---|---|
| 5-th percentile | 20000 |
| Q1 | 50000 |
| median | 140000 |
| Q3 | 240000 |
| 95-th percentile | 430000 |
| Maximum | 1000000 |
| Range | 990000 |
| Interquartile range (IQR) | 190000 |
Descriptive statistics
| Standard deviation | 129876.6449 |
|---|---|
| Coefficient of variation (CV) | 0.7761665531 |
| Kurtosis | 0.5456392649 |
| Mean | 167330.8961 |
| Mean Absolute Deviation (MAD) | 105037.984 |
| Skewness | 0.9974503939 |
| Sum | 4965209680 |
| Variance | 1.686794288e+10 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 50000 | 3327 | 11.2% | |
| 20000 | 1963 | 6.6% | |
| 30000 | 1597 | 5.4% | |
| 80000 | 1544 | 5.2% | |
| 200000 | 1495 | 5.0% | |
| 150000 | 1096 | 3.7% | |
| 100000 | 1040 | 3.5% | |
| 180000 | 980 | 3.3% | |
| 360000 | 833 | 2.8% | |
| 60000 | 825 | 2.8% | |
| Other values (71) | 14973 | 50.5% |
| Value | Count | Frequency (%) | |
| 10000 | 488 | 1.6% | |
| 16000 | 2 | < 0.1% | |
| 20000 | 1963 | 6.6% | |
| 30000 | 1597 | 5.4% | |
| 40000 | 230 | 0.8% |
| Value | Count | Frequency (%) | |
| 1000000 | 1 | < 0.1% | |
| 800000 | 2 | < 0.1% | |
| 780000 | 2 | < 0.1% | |
| 760000 | 1 | < 0.1% | |
| 750000 | 4 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
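The calculation described above can be sketched in a few lines of standard-library Python. The function name `pearson_r` and the sample data are illustrative only. The normalising factors of the covariance and the standard deviations cancel, so the sketch works with plain sums of deviations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/(n - 1) factors in the covariance and in both standard
    # deviations cancel, so unnormalised sums are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and positively rescaling one variable leaves r unchanged.
shifted_scaled_xs = [10 * x + 7 for x in xs]
r_transformed = pearson_r(shifted_scaled_xs, ys)
print(r, r_transformed)
```

The second call illustrates the location and scale invariance mentioned in the text.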
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
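As a rough illustration of the description above, the sketch below ranks each variable (ties receive the average of the ranks they span, which is a common convention and an assumption here) and then applies the Pearson formula to the ranks. All function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but nonlinear relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

Because the relationship is perfectly monotonic, ρ reaches 1 while Pearson's r stays below 1.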
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
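The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch follows, with a hypothetical function name and data. Tied pairs count as neither concordant nor discordant here, and statistical libraries often report a tie-adjusted variant (τ-b) instead, so values can differ on data with many ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in xs or ys; it counts for neither side.
    return (concordant - discordant) / len(pairs)


# Hypothetical data: 7 concordant and 3 discordant pairs out of 10.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 5, 4]
tau = kendall_tau_a(xs, ys)
print(tau)
```

This direct version compares every pair and so runs in quadratic time, which is fine for illustration but slow for a dataset of this report's size.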
First rows
| | df_index | GRADUATE | AGE_GROUP | BILL_AMT5 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | LIMIT_BAL_GROUP | default | AGE | LIMIT_BAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 1.0 | 0.0 | 0.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 24.0 | 20000.0 |
| 1 | 1 | 0.0 | 2.0 | 3455.0 | 0.0 | 1000.0 | 1000.0 | 1000.0 | 0.0 | 2000.0 | 3.0 | 1.0 | 26.0 | 120000.0 |
| 2 | 2 | 0.0 | 3.0 | 14948.0 | 1518.0 | 1500.0 | 1000.0 | 1000.0 | 1000.0 | 5000.0 | 2.0 | 0.0 | 34.0 | 90000.0 |
| 3 | 3 | 0.0 | 4.0 | 28959.0 | 2000.0 | 2019.0 | 1200.0 | 1100.0 | 1069.0 | 1000.0 | 1.0 | 0.0 | 37.0 | 50000.0 |
| 4 | 4 | 0.0 | 7.0 | 19146.0 | 2000.0 | 36681.0 | 10000.0 | 9000.0 | 689.0 | 679.0 | 1.0 | 0.0 | 57.0 | 50000.0 |
| 5 | 5 | 1.0 | 4.0 | 19619.0 | 2500.0 | 1815.0 | 657.0 | 1000.0 | 1000.0 | 800.0 | 1.0 | 0.0 | 37.0 | 50000.0 |
| 6 | 6 | 1.0 | 2.0 | 483003.0 | 55000.0 | 40000.0 | 38000.0 | 20239.0 | 13750.0 | 13770.0 | 6.0 | 0.0 | 29.0 | 500000.0 |
| 7 | 7 | 0.0 | 1.0 | -159.0 | 380.0 | 601.0 | 0.0 | 581.0 | 1687.0 | 1542.0 | 2.0 | 0.0 | 23.0 | 100000.0 |
| 8 | 8 | 0.0 | 2.0 | 11793.0 | 3329.0 | 0.0 | 432.0 | 1000.0 | 1000.0 | 1000.0 | 3.0 | 0.0 | 28.0 | 140000.0 |
| 9 | 9 | 0.0 | 3.0 | 13007.0 | 0.0 | 0.0 | 0.0 | 13007.0 | 1122.0 | 0.0 | 1.0 | 0.0 | 35.0 | 20000.0 |
Last rows
| | df_index | GRADUATE | AGE_GROUP | BILL_AMT5 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | LIMIT_BAL_GROUP | default | AGE | LIMIT_BAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29664 | 29991 | 0.0 | 3.0 | 2500.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 1.0 | 34.0 | 210000.0 |
| 29665 | 29992 | 0.0 | 5.0 | 0.0 | 2000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 43.0 | 10000.0 |
| 29666 | 29993 | 1.0 | 4.0 | 69473.0 | 2000.0 | 111784.0 | 4000.0 | 3000.0 | 2000.0 | 2000.0 | 2.0 | 0.0 | 38.0 | 100000.0 |
| 29667 | 29994 | 0.0 | 3.0 | 82607.0 | 7000.0 | 3500.0 | 0.0 | 7000.0 | 0.0 | 4000.0 | 2.0 | 1.0 | 34.0 | 80000.0 |
| 29668 | 29995 | 0.0 | 4.0 | 31237.0 | 8500.0 | 20000.0 | 5003.0 | 3047.0 | 5000.0 | 1000.0 | 5.0 | 0.0 | 39.0 | 220000.0 |
| 29669 | 29996 | 0.0 | 5.0 | 5190.0 | 1837.0 | 3526.0 | 8998.0 | 129.0 | 0.0 | 0.0 | 3.0 | 0.0 | 43.0 | 150000.0 |
| 29670 | 29997 | 0.0 | 4.0 | 20582.0 | 0.0 | 0.0 | 22000.0 | 4200.0 | 2000.0 | 3100.0 | 1.0 | 1.0 | 37.0 | 30000.0 |
| 29671 | 29998 | 0.0 | 5.0 | 11855.0 | 85900.0 | 3409.0 | 1178.0 | 1926.0 | 52964.0 | 1804.0 | 2.0 | 1.0 | 41.0 | 80000.0 |
| 29672 | 29999 | 0.0 | 6.0 | 32428.0 | 2078.0 | 1800.0 | 1430.0 | 1000.0 | 1000.0 | 1000.0 | 1.0 | 1.0 | 46.0 | 50000.0 |
| 29673 | 30000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |